Method And Apparatus For Discriminating Speech
From Voice-Band Data In A Communication Network 

Background Of The Invention 

1. Technical Field 

This invention relates to the field of communications, and more 
particularly to a method and an apparatus for discriminating speech from 
voice-band data in a communication network.

2. Description of Related Art 

It is well known that the ability to discriminate between speech and
voice-band data (VBD) signals, e.g., originating from a modem or facsimile
machine, in a communication network can improve network efficiency
and/or ensure Quality of Service requirements. For example, although
channels of a conventional telephone network each carry 64 kbps,
regardless of whether the channel is carrying speech or VBD, speech can
be substantially compressed, e.g., to 8 kbps or 5.3 kbps, at an interface
between the telephone network channel and a high-bandwidth integrated
service communication system, such as at an ATM (Asynchronous Transfer
Mode) trunking device or an IP (Internet Protocol) telephone network
gateway. Therefore, because the type of traffic received at such an interface
device can dictate the signal processing performed, several techniques for
discriminating between speech and VBD signals have previously been
proposed. Such techniques conventionally rely on parameters such as
zero-point crossing rates, signal extrema, high/low frequency power rates,
and/or power variations between sequential signal segments to
discriminate speech from VBD.

Although conventional techniques for discriminating between speech
and VBD signals generally achieve low error rates for relatively low-speed
VBD, the error rate for such techniques increases significantly for
discrimination between speech and high-speed VBD transmissions, such
as from V.32, V.32bis, V.34, and V.90 modems, which utilize higher symbol
rates and complex coding/modulation techniques and generate signals
with many characteristics that differ from those of low-speed
transmissions. For high-speed VBD, higher error rates occur because the
distributions of many parameter values, such as zero-point crossing rates,
signal extrema, and power variations, tend to overlap with the
corresponding speech parameter values.

Summary Of The Invention 

The present invention is a method and an apparatus which
accurately discriminate between speech and VBD in a communication
network based on at least one of self similarity ratio (SSR) values, which
indicate periodicity characteristics of an input signal segment, and
autocorrelation coefficients, which indicate spectral characteristics of an
input signal segment, to generate a speech/VBD discrimination result.

Typically, voiced speech is characterized by relatively high energy
content and periodicity, i.e., "pitch", unvoiced speech exhibits little or no
periodicity, and transition regions which occur between voiced and
unvoiced speech regions often have characteristics of both voiced and
unvoiced speech. During normal transmission, high-speed VBD is
scrambled, encoded, and modulated, thereby appearing as noise with no
periodicity. Some low-speed VBD signals, such as control signals used
during a start-up procedure, exhibit periodicity. The present invention
discriminates between periodic speech and VBD signals by recognizing
that periodic VBD signals will typically have a faster repetition rate than
voiced speech, and calculating short-term delay and long-term delay SSR
values to indicate the repetition rate of an input signal frame.

The present invention also recognizes that analyzing the periodicity
characteristics of an input frame may not ensure accurate speech/VBD
discrimination, and that certain spectral characteristics of an input
frame may reveal whether the input frame is speech or VBD. For example,
the carrier frequency used by a typical modem/fax is within a narrow
range, whereas speech is a non-stationary random signal which typically
exhibits large variations in its power spectrum. The present invention
calculates short-term autocorrelation coefficients to determine the spectral
envelope of an input frame to facilitate accurate speech/VBD
discrimination.

According to one implementation of the present invention, the
speech/VBD discrimination technique of the present invention is
implemented in a sequential decision logic algorithm which improves
classification performance by recognizing that changes from speech to VBD
or vice versa in a communication medium are unlikely. Therefore, after a
predetermined number of frames have been classified as speech or VBD
based on SSR values and/or autocorrelation coefficients, the sequential
decision logic algorithm enters a "speech state" or a "VBD state" in which
the speech/VBD discrimination output does not change unless a certain
number of subsequent classification results indicate that the current
decision state is erroneous. In one exemplary implementation of the
present invention, the sequential decision logic algorithm discounts
discrimination results for relatively low-power signal portions, which are
more susceptible to errors, to further improve discrimination accuracy.

BRIEF DESCRIPTION OF THE DRAWINGS

Other aspects and advantages of the present invention will become 
apparent from the following detailed description and accompanying 
drawings, where: 

Fig. 1 is a general block diagram of an apparatus for discriminating
speech from VBD signals in accordance with one embodiment of the
present invention;

Fig. 2 is a flowchart illustrating speech/VBD discrimination based
on SSR values and autocorrelation coefficients according to an
embodiment of the present invention; and

Figs. 3A-3C are flowcharts illustrating a sequential decision logic 
algorithm for classifying input signal segments as either speech or VBD in 
accordance with an embodiment of the present invention. 

DETAILED DESCRIPTION

The present invention is a method and apparatus for accurately
discriminating speech from VBD in a communication network. Fig. 1 is a
general block diagram illustrating an exemplary speech/VBD discriminator
100 in accordance with one embodiment of the present invention which
may be implemented in a network interface device, such as an ATM
trunking device or an IP telephone network gateway. As shown in Fig. 1,
the speech/VBD discriminator 100 includes an input frame buffer 110, a
high-pass filter 120, and a speech/VBD discriminating unit 130. It should
be recognized that, although the general block diagram of Fig. 1 illustrates
a plurality of discrete components, the speech/VBD discriminator 100 may
be implemented in a variety of ways, such as in a software driven processor,
e.g., a Digital Signal Processor (DSP), in programmable logic devices, in
application specific integrated circuits, or in a combination of such devices.

The input frame buffer 110 receives an input signal, e.g., from a
network line card which samples the signal from a conventional telephone
network channel at an 8 kHz clock rate, to buffer frames of N consecutive
speech samples per frame. Nominally, the input signal received by the
input frame buffer has been sampled at an 8 kHz clock rate, the frame size
is in the range of 10 milliseconds (i.e., N = 80 samples at an 8 kHz sampling
rate) to 30 milliseconds (i.e., N = 240 samples at an 8 kHz sampling rate),
and a 16-bit linear binary word represents the amplitude of an input
sample (i.e., the magnitude of an input sample is no more than 2^15). The
high-pass filter 120 filters each frame of N samples to remove DC
components therefrom. Input frames are high-pass filtered because DC
signal components have little useful information for speech/VBD
discrimination, and may cause bias errors when computing the signal
feature values discussed below. An exemplary filter transfer function
represented in the z-transform domain, H(z), used by the high-pass filter
120 is represented as:

    H(z) = \frac{1 - z^{-1}}{1 - \frac{127}{128} z^{-1}},    (1)

where z^{-1} = e^{-j\omega}. The speech/VBD discriminating unit 130 receives the
output of the high-pass filter 120, and performs speech/VBD
discrimination in a manner described in more detail below.
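
The DC-removal step can be illustrated with a minimal Python sketch. It assumes a first-order DC-blocking filter with its pole at 127/128; the pole value, the function name `dc_block`, and the way filter state is carried between frames are assumptions made for illustration only.

```python
def dc_block(frame, state=(0.0, 0.0), pole=127.0 / 128.0):
    """First-order DC blocker: y[i] = x[i] - x[i-1] + pole * y[i-1].

    `state` holds (previous input, previous output) so that consecutive
    frames can be filtered as one continuous signal. The pole value is an
    assumption for this sketch.
    """
    prev_x, prev_y = state
    out = []
    for x in frame:
        y = x - prev_x + pole * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out, (prev_x, prev_y)


if __name__ == "__main__":
    # A constant (pure DC) input decays toward zero at the filter output.
    filtered, _ = dc_block([1000.0] * 2000)
    print(filtered[0], filtered[-1])
```

Carrying the state across calls lets the filter run frame by frame without a discontinuity at frame boundaries.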

Typically, speech includes voiced regions, which are characterized by
relatively high energy content and periodicity (commonly referred to as
"pitch"), unvoiced regions which have little or no periodicity, and transition
regions which occur between voiced and unvoiced speech regions and,
thus, often have characteristics of both voiced and unvoiced speech.
During normal transmission, high-speed VBD is scrambled, encoded, and
modulated, thereby appearing as noise with no periodicity. Some low-speed
VBD signals, such as control signals used during a start-up
procedure, exhibit periodicity.

The present invention recognizes that VBD signals which exhibit
periodicity will typically have a faster repetition rate than voiced speech,
and also recognizes that certain spectral characteristics can be
effectively used to discriminate VBD from speech. For example, the carrier
frequency used by a typical modem/fax is within a narrow range, e.g.,
between 1 kHz and 3 kHz, such that the power spectrum of a VBD signal
is centered on the carrier frequency, e.g., typically centered above 1 kHz.
On the other hand, speech is a non-stationary random signal which
typically exhibits large power spectrum variations. The present invention
calculates short-term autocorrelation coefficients to determine the spectral
characteristics of an input signal to aid speech/VBD discrimination. To
enable speech/VBD discrimination in accordance with these principles,
the speech/VBD discriminating unit 130 performs the calculations
described below for each buffered and filtered frame of N samples.

The speech/VBD discriminating unit 130 calculates short-time
power, Ps, of an input frame using a window of N samples by calculating:

    Ps(n) = \frac{1}{N} \sum_{i=(n-1)N}^{nN-1} x^2(i),    (2)

where n is the frame number, and x(i) is the amplitude of sample i. The
speech/VBD discriminating unit 130 also calculates SSR values to
measure the similarity between sequential signal segments. More
specifically, two separate SSR calculations are made for each frame to
extract periodicity characteristics thereof. SSR1(n), representing SSR for a
range of relatively small sample delays, is calculated as:

    SSR1(n) = \max_j \{COL(n, j)\},  3 \le j \le 17,    (3)

where j is the sample delay, and COL(n, j) is calculated as:

    COL(n, j) = \frac{\sum_{i=(n-1)N}^{nN-1} x(i) \cdot x(i-j)}{\sum_{i=(n-1)N}^{nN-1} x(i-j) \cdot x(i-j)}.    (4)

SSR2(n), representing SSR for a range of relatively large sample delays, is
calculated as:

    SSR2(n) = \max_j \{COL(n, j)\},  18 \le j \le 143.    (5)

For voiced speech, the delay, i.e., the value of j, which results in the
largest (max) SSR is the estimated pitch period (or a multiple of it). The
pitch period of human voice is typically in the range of 2.225 milliseconds
to 17.7 milliseconds, or approximately 18 to 142 samples in an 8 kHz
sampled signal. Therefore, if SSR2(n) is larger than a certain threshold,
this tends to indicate that the corresponding frame is voiced speech. If
SSR1(n) is a large value, however, the input signal frame may be a
non-speech stationary signal with a high repetition rate.
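
The following Python sketch illustrates how the frame power and the two SSR values might be computed. It assumes that Ps is the mean of the squared samples in the frame and that COL(n, j) is the sum of lagged products normalized by the energy of the delayed segment; the function names and the `start` index convention are hypothetical.

```python
import math


def short_time_power(x, start, n_samples):
    """Mean squared amplitude of the frame x[start : start + n_samples]."""
    return sum(x[i] * x[i] for i in range(start, start + n_samples)) / n_samples


def col(x, start, n_samples, j):
    """Similarity between the frame and the same frame delayed by j samples."""
    idx = range(start, start + n_samples)
    num = sum(x[i] * x[i - j] for i in idx)
    den = sum(x[i - j] * x[i - j] for i in idx)
    return num / den if den > 0.0 else 0.0


def ssr(x, start, n_samples, j_min, j_max):
    """Largest COL value over the delay range j_min..j_max (inclusive)."""
    return max(col(x, start, n_samples, j) for j in range(j_min, j_max + 1))


if __name__ == "__main__":
    n = 160      # 20 ms frame at 8 kHz
    start = 240  # leaves at least 143 samples of history before the frame
    # Slow repetition (period 80 samples, pitch-like) vs. fast repetition (period 8).
    slow = [(i % 80) - 39.5 for i in range(480)]
    fast = [math.sin(2.0 * math.pi * i / 8.0) for i in range(480)]
    for name, sig in (("slow", slow), ("fast", fast)):
        print(name, ssr(sig, start, n, 3, 17), ssr(sig, start, n, 18, 143))
```

In this toy input, the slowly repeating signal yields a large SSR2 but a moderate SSR1, while the fast repeating tone yields a large SSR1, which is the distinction the two delay ranges are meant to capture.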

The speech/VBD discriminating unit 130 also calculates
autocorrelation coefficients, which represent certain spectral
characteristics of the frame of interest. Because an autocorrelation
function of a signal is the inverse Fourier transform of its power spectrum,
a short-term autocorrelation function, or low-delay autocorrelation
coefficients, represents the spectral envelope of a frame. The present
invention uses three autocorrelation coefficients, with 2, 3, and 4 sample
delays respectively, to analyze spectral characteristics of a frame of
interest. A normalized representation of autocorrelation for an input frame
with a delay of k samples, Rkd(n), using a window of N consecutive
samples, is represented by:

    Rkd(n) = \frac{1}{N \cdot Ps(n)} \sum_{i=(n-1)N}^{nN-1} x(i) \cdot x(i-k).    (6)

To establish a relationship between the power spectrum of a signal and
autocorrelation coefficients, it can be assumed that the input signal is a
single tone represented as:

    x(k) = A \cdot \sin(2\pi f k / f_s + \theta),    (7)

where f_s = 8 kHz, and k = 0, 1, 2, .... In this case, the autocorrelation
coefficient with a delay of two samples, R2d, is:

    R2d = \cos(4\pi f / f_s).    (8)

From equation (8), it can be seen that R2d will be negative for 1 kHz
< f < 3 kHz. Most VBD carrier frequencies lie in this range. If the input is
a single tone, or a narrow-band signal with a power spectrum centered
around 2 kHz, then R2d will be nearly -1. On the other hand, if the input
signal is a tone or narrow-band signal with a power spectrum centered
around 0 kHz or 4 kHz, then R2d will be nearly +1.

According to equation (7), R3d and R4d can respectively be
calculated as follows:

    R3d = \cos(6\pi f / f_s);    (9)

    R4d = \cos(8\pi f / f_s).    (10)

From equation (9), it can be seen that R3d is near -1 when the input
signal is a narrow-band signal with a power spectrum centered around
1.33 kHz, near 4 kHz, or both. If R4d is near -1, then the input signal
should be a narrow-band signal with a power spectrum centered around 1
kHz, 3 kHz, or both. Accordingly, R3d and R4d are effective parameters for
discriminating single-tone, multi-tone, and very low-speed VBD, such
as used by many fax/modem systems, from speech.

As one practical example, the V.21 300 bps FSK duplex modem uses
different carrier frequencies (H, L) for the two directions of transmission.
The lower channel, V.21 (L), has a nominal mean frequency of 1080 Hz with
a frequency deviation of +/- 100 Hz. From equation (10), such a
transmission results in:

    f = 1180 Hz: R4d = \cos(8\pi \cdot 1180 / 8000) = -0.844;

    f = 980 Hz:  R4d = \cos(8\pi \cdot 980 / 8000) = -0.998.

Therefore, an R4d value of a V.21 (L) signal will be less than -0.80. The
higher channel, V.21 (H), has a nominal mean frequency of 1750 Hz with
a frequency deviation of +/- 100 Hz. From equation (8), R2d for a V.21 (H)
signal will also be less than -0.8.

As another example, the V.22 600 Hz symbol rate QPSK/DPSK duplex
modem uses a 1200 Hz carrier for its lower channel, and a 2400 Hz carrier
and 1800 Hz guard tone for its higher channel. For a V.22 (L) signal, from
equation (9), we have:

    f = 1200 Hz: R3d = \cos(6\pi \cdot 1200 / 8000) = -0.95.

Therefore, R3d will be near -1. R2d of a V.22 (H) signal will also be less
than -0.8.
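
The single-tone relationships above can be checked numerically. The sketch below compares the closed-form value cos(2·π·k·f/f_s) for a delay of k samples with an autocorrelation measured directly on a synthesized tone. The function names are hypothetical, and the measurement normalizes by the energy of the window, which is an assumption about the normalization.

```python
import math

FS = 8000.0  # sampling rate in Hz


def tone_autocorr(freq_hz, delay, fs=FS):
    """Closed-form normalized autocorrelation of a pure tone at `delay` samples."""
    return math.cos(2.0 * math.pi * delay * freq_hz / fs)


def measured_autocorr(freq_hz, delay, n_samples=8000, fs=FS):
    """Normalized autocorrelation measured on a synthesized unit-amplitude tone."""
    x = [math.sin(2.0 * math.pi * freq_hz * k / fs) for k in range(n_samples + delay)]
    idx = range(delay, n_samples + delay)
    num = sum(x[i] * x[i - delay] for i in idx)
    den = sum(x[i] * x[i] for i in idx)
    return num / den


if __name__ == "__main__":
    cases = ((1180.0, 4), (980.0, 4), (1750.0, 2), (1200.0, 3))
    for freq, delay in cases:
        print(freq, delay, tone_autocorr(freq, delay), measured_autocorr(freq, delay))
```

With a one-second window and integer-hertz tones, the measured values agree closely with the closed form, since the window spans a whole number of tone periods.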

Fig. 2 illustrates a "raw decision" sequence for classifying a single
input frame as being either speech or VBD using the calculated features
discussed above. After calculating the Ps, SSR1, SSR2, R2d, R3d, and R4d
values discussed above (step 150), the speech/VBD discriminating unit
130 initially attempts to classify the frame of interest as either speech or
VBD based on R2d (step 152). Specifically, if R2d is less than or equal to a
low threshold TR2L, e.g., TR2L = -0.75, the input frame is classified as
VBD. If R2d is greater than or equal to a high threshold TR2H, e.g., TR2H
= 0.55, the input frame is classified as speech.

If R2d is between TR2L and TR2H, then the speech/VBD 
discriminating unit 130 next attempts to achieve a discrimination result 






2925-494P 

Lucent Docket No. 120415/Zhang 1 

based on SSR1 (step 158). Specifically, if SSR1 is greater than or equal to
a first similarity threshold TS1, e.g., TS1 = 0.96, the input frame is
classified as VBD. If SSR1 is less than TS1, the speech/VBD discriminating
unit 130 next attempts to discriminate based on R3d and R4d (step 162).
Specifically, the input frame is classified as VBD if R3d is less than or
equal to a threshold TR3, e.g., TR3 = -0.8, if R4d is less than or equal to a
threshold TR4, e.g., TR4 = -0.85, or if R3d + R4d is less than or equal to a
threshold TR34, e.g., TR34 = -1.37.

If none of these conditions is met, the speech/VBD discriminating
unit 130 next attempts to discriminate based on SSR2 (step 166).
Specifically, if SSR2 is greater than or equal to a threshold TS2, e.g., TS2 =
0.51, the input frame is classified as speech. If SSR2 is less than TS2, the
input frame is classified as VBD.
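
The raw decision sequence described above can be sketched as a short cascade of threshold tests. The Python sketch below uses the example threshold values given in the text as defaults; the function name `raw_decision` and the string labels are hypothetical.

```python
def raw_decision(r2d, r3d, r4d, ssr1, ssr2,
                 tr2l=-0.75, tr2h=0.55, ts1=0.96,
                 tr3=-0.8, tr4=-0.85, tr34=-1.37, ts2=0.51):
    """Classify one frame as "speech" or "VBD" from its feature values."""
    # Step 152: a strongly negative R2d suggests a carrier between 1 and 3 kHz.
    if r2d <= tr2l:
        return "VBD"
    if r2d >= tr2h:
        return "speech"
    # Step 158: high similarity at small delays suggests a fast repetition rate.
    if ssr1 >= ts1:
        return "VBD"
    # Step 162: R3d and R4d catch tones and very low-speed VBD.
    if r3d <= tr3 or r4d <= tr4 or r3d + r4d <= tr34:
        return "VBD"
    # Step 166: high similarity at pitch-range delays suggests voiced speech.
    return "speech" if ssr2 >= ts2 else "VBD"


if __name__ == "__main__":
    print(raw_decision(r2d=-0.9, r3d=0.0, r4d=0.0, ssr1=0.2, ssr2=0.2))
    print(raw_decision(r2d=0.1, r3d=0.0, r4d=0.0, ssr1=0.5, ssr2=0.8))
```

Ordering the tests this way means the later, less decisive features are consulted only when the earlier ones fail to settle the classification.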

Recognizing that once a frame is classified as speech or VBD, the
next frame will probably have the same classification, the speech/VBD
discrimination technique described above is implemented in a sequential
decision logic algorithm in accordance with one embodiment of the present
invention to improve decision reliability.

Figs. 3A-3C are flowcharts which illustrate an exemplary sequential
decision logic algorithm implemented by the speech/VBD discriminating
unit 130 to discriminate speech and VBD. The sequential decision logic
algorithm illustrated in Figs. 3A-3C essentially has six states: (1) an
initialization state; (2) a determination state in which individual input
frames are classified as being either speech or VBD; (3) a speech state in
which the classification result remains speech until subsequent
classification results indicate that the speech state is erroneous; (4) a "was
speech" state in which a period of low power occurs after entering the
speech state; (5) a VBD state in which the classification result remains
VBD until subsequent classification results indicate the VBD state is
erroneous; and (6) a "was VBD" state in which a period of low power
occurs after entering the VBD state. The significance of these classification
states will become more apparent from the following description.

Referring to Fig. 3A, during an initialization step, each counter used
in the sequential decision algorithm is set to 0 (step 202). Next, the
discriminating unit 130 calculates Ps for a frame of interest (step 204) and
determines whether Ps is greater than or equal to an energy threshold
ETh1 (step 206). When Ps is less than ETh1, the discriminating unit does
not attempt to determine whether the frame is speech or VBD, and instead
returns to step 204 to calculate the Ps for the next frame. In other words,
the discriminating unit 130 does not initially attempt to classify input
frames as speech or VBD until Ps reaches ETh1. The sequential decision
logic algorithm remains in an initialization state until Ps reaches ETh1.

When the discriminating unit 130 determines that Ps is greater than
or equal to ETh1, the sequential decision logic algorithm enters a
determination state in which the speech/VBD discriminating unit 130
calculates discrimination feature values for the frame of interest (step 208)
and decides whether these discrimination feature values indicate that the
frame of interest is speech or VBD (step 210). In other words, the
discriminating unit 130 executes the raw decision logic discussed above
with reference to Fig. 2 to classify the frame of interest as speech or VBD.
When the frame of interest is classified as speech, a speech counter Spc is
incremented by 1 (step 212), and Spc is compared to a speech count
threshold Spy, e.g., Spy = 1 (step 214). If Spc is less than Spy, the
sequential decision logic remains in the determination state and the
discriminating unit 130 computes the discrimination feature values for the
next input frame (step 208). If Spc is at least equal to Spy, the sequential
decision logic enters the speech state, which is described below with
reference to Fig. 3B.

If, at step 210, the input frame is classified as VBD, a VBD counter
Mdc is incremented by 1 (step 216), and Mdc is compared to a VBD count
threshold Mdy, e.g., Mdy = 4 (step 218). If Mdc is less than Mdy, the
sequential decision logic remains in the determination state, and the
discriminating unit 130 computes the discrimination feature values for the
next frame (step 208). If Mdc is at least equal to Mdy, the sequential
decision logic enters the VBD state, which is discussed in detail below with
reference to Fig. 3C. In accordance with the sequential decision logic shown
in Fig. 3B, after a predetermined number of frames have been classified as
speech/VBD based on SSR and/or autocorrelation coefficient values so
that the sequential decision logic algorithm enters the speech/VBD state,
the speech/VBD discrimination output does not change unless a certain
number of subsequent classification results indicate that the speech/VBD
state is erroneous.

Referring to Fig. 3B, when the sequential decision logic enters the
speech state (step 230), Ps is calculated for the next frame (step 232) and
compared with the energy threshold ETh1 (step 234). If Ps is at least equal
to ETh1, a silence counter Sic is set equal to 0 (step 236), and the
speech/VBD discriminating unit 130 calculates discrimination feature
values for the frame (step 238) so that the input frame can be
classified as speech or VBD (step 240), i.e., the "raw decision" is performed.
If the input frame is classified as speech at step 240, the VBD counter Mdc
is divided by 2 (step 242), the sequential decision logic remains in the
speech state, and the classification sequence returns to step 232 so that the
discriminating unit 130 calculates Ps for the next frame. If the input frame
is recognized as VBD at step 240, the VBD counter Mdc is incremented by
a "power-compensated" increment x (described in detail below) (step 244),
and Mdc is compared with the VBD state-change threshold Mdx, e.g., Mdx
= 8 (step 246). If Mdc is not at least equal to Mdx, the sequential decision
logic remains in the speech state, and the decision sequence returns to
step 232 so that the speech/VBD discriminating unit 130 calculates Ps for
the next frame. When, however, Mdc is at least equal to Mdx, the VBD
counter Mdc is reset to 0 (step 248), and the sequential decision logic
switches to the VBD state.

When the speech/VBD discriminating unit 130 determines at step
234 that Ps is less than ETh1, the silence counter Sic is incremented by 1
(step 250) and compared to a silence counter threshold Siy, e.g., Siy = 8
(step 252). If Sic has not reached Siy, the sequential decision logic remains
in the speech state, and proceeds to step 238 so that the discriminating
unit 130 computes discrimination values for the frame of interest. When
Sic reaches Siy, however, the sequential decision logic enters a "was
speech" state which will next be described with reference to flow diagram
blocks 253-257. During the "was speech" state, the discriminating unit
130 initially calculates Ps for the next frame (step 253), and compares Ps
with the energy threshold ETh1 (step 254). If Ps is greater than or equal to
ETh1, the silence counter Sic is reset to 0 (step 255) and the sequential
decision logic returns to speech state step 238. When the discriminating
unit 130 determines that Ps is less than ETh1 at step 254, the silence
counter Sic is incremented by 1 (step 256) and Sic is compared to a second
silence counter threshold Six (step 257), e.g., Six = 200. If Sic has not
reached Six, the sequential decision logic remains in the "was speech"
state, and Ps is calculated for the next frame at step 253. When Sic
reaches Six, however, the sequential decision logic returns to its
initialization state at step 202, i.e., a reset occurs.

Referring next to Fig. 3C, it can be seen that the sequential decision
logic operates during the VBD state in a similar manner to the speech state
described above with regard to Fig. 3B. Specifically, after entering the VBD
state (step 260) based on the determination at step 218 or step 246, the
discriminating unit 130 calculates Ps for the next frame (step 262) and
compares Ps with the energy threshold ETh1 (step 264). If Ps is greater
than or equal to ETh1, the silence counter Sic is set equal to 0 (step 266),
and the discriminating unit 130 computes the discrimination feature
values for the frame of interest (step 268) so that the discriminating unit
130 determines whether the frame of interest is speech or VBD based on
the "raw decision" logic of Fig. 2 (step 270). If the discriminating unit 130
determines at step 270 that the frame of interest is VBD, the speech
counter Spc is divided by two (step 272), the sequential decision logic
remains in the VBD state, and Ps is calculated for the next frame (step
262). If the discriminating unit 130 determines at step 270 that the frame
of interest is speech, the speech counter Spc is incremented by a
"power-compensated" increment x (step 274), and Spc is compared with a
speech counter threshold Spx, e.g., Spx = 4 (step 276). If Spc is not at least
equal to Spx, the sequential decision logic remains in the VBD state and
returns to step 262 so that the discriminating unit 130 calculates Ps for the
next frame. If Spc is determined to be at least equal to Spx at step 276, the
speech counter Spc is reset to 0 (step 278) and the sequential decision logic
enters the speech state discussed above with reference to Fig. 3B.

When Ps is less than ETh1 at step 264, the silence counter Sic is
incremented by 1 (step 280) and compared with the silence counter
threshold Siy (step 282). If Sic is not at least equal to Siy, the sequential
decision logic remains in the VBD state and proceeds to step 268 to
compute discrimination feature values for the frame of interest. When,
however, Sic reaches Siy at step 282, the sequential decision logic enters a
"was VBD" state which is next described with reference to blocks 283-287
shown in Fig. 3C.

Specifically, the discriminating unit 130 calculates Ps for the next
frame (step 283) and compares Ps with ETh1 (step 284). If Ps is greater
than or equal to ETh1, the silence counter Sic is reset to 0 (step 285), and
the sequential decision logic returns to step 268 of the VBD state to
compute discrimination feature values for the frame of interest. When Ps is
less than ETh1 at step 284, the silence counter Sic is incremented by 1
(step 286) and Sic is compared with the second silence counter threshold
Six (step 287). When Sic is determined to be less than Six at step 287, the
sequential decision logic remains in the "was VBD" state and Ps is
calculated for the next frame (step 283). When Sic reaches Six at step 287,
however, the sequential decision logic returns to the initialization state of
step 202.
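
The six-state logic of Figs. 3A-3C can be summarized as a small state machine. The Python sketch below is a simplified illustration under stated assumptions: the class and method names are hypothetical, the raw decision and the power-compensated increment x are supplied by the caller for every frame, and the value of the energy threshold must be chosen by the user because no example value is given above.

```python
class SequentialDecision:
    """Sketch of the six-state decision logic; names here are hypothetical."""

    def __init__(self, eth1, spy=1, mdy=4, spx=4, mdx=8, siy=8, six=200):
        self.eth1, self.spy, self.mdy = eth1, spy, mdy
        self.spx, self.mdx, self.siy, self.six = spx, mdx, siy, six
        self.reset()

    def reset(self):
        # Initialization state: all counters are cleared (step 202).
        self.state = "init"
        self.spc = self.mdc = 0.0
        self.sic = 0

    def step(self, ps, raw, x=1.0):
        """Process one frame; `raw` is "speech" or "VBD" from the raw decision."""
        loud = ps >= self.eth1
        if self.state == "init":
            if not loud:
                return self.state
            self.state = "determine"
        if self.state == "determine":
            self._determine(raw)
        elif self.state in ("speech", "vbd"):
            self._steady(loud, raw, x)
        else:  # "was_speech" or "was_vbd"
            self._was(loud, raw, x)
        return self.state

    def _determine(self, raw):
        if raw == "speech":
            self.spc += 1
            if self.spc >= self.spy:
                self.state = "speech"
        else:
            self.mdc += 1
            if self.mdc >= self.mdy:
                self.state = "vbd"

    def _steady(self, loud, raw, x):
        if loud:
            self.sic = 0
        else:
            self.sic += 1
            if self.sic >= self.siy:
                self.state = "was_" + self.state  # prolonged low power
                return
        self._classify(raw, x)

    def _classify(self, raw, x):
        if self.state == "speech":
            if raw == "speech":
                self.mdc /= 2
            else:
                self.mdc += x
                if self.mdc >= self.mdx:
                    self.mdc, self.state = 0.0, "vbd"
        else:  # VBD state
            if raw == "VBD":
                self.spc /= 2
            else:
                self.spc += x
                if self.spc >= self.spx:
                    self.spc, self.state = 0.0, "speech"

    def _was(self, loud, raw, x):
        if loud:
            self.sic = 0
            self.state = self.state[len("was_"):]
            self._classify(raw, x)
        else:
            self.sic += 1
            if self.sic >= self.six:
                self.reset()
```

A caller would typically create one instance per channel and call `step` once per frame, treating the returned state name as the current discrimination output.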

Regarding the "power-compensated" increment x discussed above
with reference to the speech state and VBD state decision logic, the present
invention recognizes that discrimination between speech and VBD is more
prone to errors for relatively low-power signal portions. For speech, a
low-power signal portion may be unvoiced speech or gaps between speech.
For VBD, a low-power portion may represent gaps between transmissions, or
the waiting period during a handshake procedure. These signal portions
are more prone to be influenced by noise and cross-talk because lower
signal power results in a lower signal-to-noise ratio. Therefore, the
"power-compensated" increment x used to control when the sequential
decision logic switches from the speech state to the VBD state, and vice
versa, is a function of Ps. For a relatively low Ps, a small x is assigned.
Otherwise, a larger x is used. Additionally, an adaptive power threshold,
ETh2, is used to determine whether a relatively large or small value of x
should be used. ETh2 is calculated as follows:

    P_{max} = \max(\alpha \cdot P_{max}, Ps(n))

    ETh2 = \beta \cdot P_{max}    (11)

    ETh2 \in [Ebnd, Ebup],

where Ebup and Ebnd are the upper and lower boundaries of ETh2,
respectively. Ebnd can be as small as ETh1 or a multiple of ETh1, e.g.,
Ebnd = 10 · ETh1, and Ebup can be, e.g., 1.2 × 10^7. The symbol α
represents a constant which is near 1, e.g., α = 0.995, and β is also a
constant which can be between 1/50 and 1/10, e.g., β = 1/12. Pmax is the
run-time estimate of the peak power of the signal.

Using ETh2, the "power-compensated" variable x can be determined
as follows:

    If Ps < ETh1:       x = 0;

    Else if Ps < ETh2:  x = \gamma;

    Else:               x = 1,    (12)

where γ is a constant in the range of [0.1, 0.5], e.g., γ = 0.2. It should be
realized that the evaluation criteria of the above-described discrimination
technique can be altered for different applications. For example, some of
the parameters discussed above can be adjusted depending on the
requirements of the individual system, for example if the system requires a
fast decision, or an extremely low misclassification ratio.
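
The adaptive threshold and the power-compensated increment can be sketched in Python as follows. The function names and the clamping of ETh2 to its bounds are assumptions for this illustration, and the value of ETh1 passed in is a placeholder, since no example value for it is given above.

```python
def update_eth2(p_max, ps, eth1, alpha=0.995, beta=1.0 / 12.0,
                ebnd=None, ebup=1.2e7):
    """Update the running peak-power estimate and derive ETh2 from it.

    Returns (new p_max, eth2). The lower bound defaults to 10 * ETh1.
    """
    if ebnd is None:
        ebnd = 10.0 * eth1
    p_max = max(alpha * p_max, ps)  # slowly decaying peak tracker
    eth2 = min(max(beta * p_max, ebnd), ebup)  # keep ETh2 within [Ebnd, Ebup]
    return p_max, eth2


def increment(ps, eth1, eth2, gamma=0.2):
    """Power-compensated increment x for the state-change counters."""
    if ps < eth1:
        return 0.0
    if ps < eth2:
        return gamma
    return 1.0


if __name__ == "__main__":
    eth1 = 1000.0  # placeholder value, chosen only for this example
    p_max, eth2 = update_eth2(0.0, 1.2e6, eth1)
    for ps in (500.0, 5.0e4, 2.0e5):
        print(ps, increment(ps, eth1, eth2))
```

Because low-power frames contribute a smaller increment, a run of noisy, quiet frames changes the decision state more slowly than a run of strong frames would.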

The foregoing merely illustrates the principles of the invention. It will
be appreciated that those skilled in the art will be able to devise various
arrangements which, although not explicitly described or shown herein,
embody the principles of the invention and are thus within its spirit and
scope.